- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
- <html>
- <head>
- <title>Clean up your Web pages with HTML TIDY</title>
- <meta name="keywords" content=
- "HTML, validation, error correction, pretty-printing">
- <meta name="author" content="Dave Raggett <dsr@w3.org>">
- <style>
- body {
- margin-left: 10%;
- margin-right: 10%;
- font-family: sans-serif
- }
- h1 { margin-left: -8% }
- h2,h3,h4,h5,h6 { margin-left: -4% }
- pre { color: green; font-weight: bold; font-size: 80%; font-family: monospace}
- em { font-style: italic; color: rgb(0, 0, 153) }
- strong { text-transform: uppercase; font-weight: bold }
- .note {font-style: italic; color: rgb(192, 101, 101) }
- /* hr {text-align: center; width: 60% } */
- blockquote {
- color: navy;
- margin-left: 1%;
- margin-right: 1%;
- text-align: center;
- font-family: "Comic Sans MS", "Times New Roman", serif
- }
- table {
- font-family: sans-serif;
- font-size: 80%;
- background: rgb(255,255,153)
- }
- td {
- font-size: 80%
- }
- .people {font-family: "Lucida Calligraphy", serif}
- :link { color: rgb(0, 0, 153) }
- :visited { color: rgb(153, 0, 153) }
- :active { color: rgb(255, 0, 102) }
- :hover { color: rgb(0, 0, 255) }
- </style>
- </head>
- <body bgcolor="#FFFFE0" background="grid.gif" text="black" link=
- "navy" vlink="black" alink="red">
- <h1 align="center"><img src="tidy.gif" width="32" height="32"
- align="top" alt="icon"> Clean up your Web pages<br>
- with HTML TIDY</h1>
-
- <p align="center"><small>Copyright © 1999 <a href=
- "http://www.w3.org">W3C</a>, see <a href="tidy.c">tidy.c</a> for
- copyright notice.</small></p>
-
- <blockquote>With many thanks to <a href="http://www.hp.com">
- Hewlett Packard</a> for financial support during the development
- of this software!</blockquote>
-
- <p align="center"><b>This version 15th April 1999</b></p>
-
- <p>See the <a href="release-notes.html"><b>release notes</b></a>
- for information on recent changes.</p>
-
- <p>To get the latest version of Tidy please visit the original
- version of this page at: <a href=
- "http://www.w3.org/People/Raggett/tidy">
- http://www.w3.org/People/Raggett/tidy</a>. Courtesy of Netmind,
- you can register for email reminders when new versions of tidy
- become available.</p>
-
- <form method="GET" action=
- "http://www.netmind.com/cgi-bin/uncgi/url-mind">
- <center><input type="SUBMIT" value="Press Here to Register">
- </center>
- </form>
-
- <hr align="center" width="80%">
- <p align="center"><a href="#help">How to use Tidy</a> | <a href=
- "#download">Downloading Tidy</a> | <a href="release-notes.html">
- Release Notes</a><br>
- <a href="#quotes">Integration with other Software</a> | <a href=
- "#acks">Acknowledgements</a></p>
-
- <hr align="center" width="80%">
- <h3>Introduction to TIDY</h3>
-
- <p>When editing HTML it's easy to make mistakes. Wouldn't it be
- nice if there was a simple way to fix these mistakes
- automatically and tidy up sloppy editing into nicely laid out
- markup? Well now there is! Dave Raggett's HTML TIDY is a free
- utility for doing just that. It also works great on the
- atrociously hard to read markup generated by specialized HTML
- editors and conversion tools, and can help you identify where you
- need to pay further attention to making your pages more
- accessible to people with disabilities.</p>
-
- <p>Tidy is able to fix up a wide range of problems and to bring
- to your attention things that you need to work on yourself. Each
- item found is listed with the line number and column so that you
- can see where the problem lies in your markup. Tidy won't
- generate a cleaned up version when there are problems it can't
- be sure how to handle. These are logged as "errors"
- rather than "warnings".</p>
-
- <h3>Examples of TIDY at work</h3>
-
- <p>Tidy corrects the markup in a way that matches where possible
- the observed rendering in popular browsers from Netscape and
- Microsoft. Here are just a few examples of how TIDY perfects your
- HTML for you:</p>
-
- <ul>
- <li><b>Missing or mismatched end tags are detected and
- corrected</b>
-
- <pre>
- <h1>heading
- <h2>subheading</h3>
- </pre>
-
- <p>is mapped to</p>
-
- <pre>
- <h1>heading</h1>
- <h2>subheading</h2>
- </pre>
- </li>
-
- <li><b>End tags in the wrong order are corrected:</b>
-
- <pre>
- <p>here is a para <b>bold <i>bold italic</b> bold?</i> normal?
- </pre>
-
- <p>is mapped to</p>
-
- <pre>
- <p>here is a para <b>bold <i>bold italic</i> bold?</b> normal?
- </pre>
- </li>
-
- <li><b>Fixes problems with heading emphasis</b>
-
- <pre>
- <h1><i>italic heading</h1>
- <p>new paragraph
- </pre>
-
- <p>In Netscape and Internet Explorer this causes everything
- following the heading to be in the heading font size, not the
- desired effect at all!</p>
-
- <p>Tidy maps the example to</p>
-
- <pre>
- <h1><i>italic heading</i></h1>
- <p>new paragraph
- </pre>
- </li>
-
- <li><b>Recovers from mixed up tags</b>
-
- <pre>
- <i><h1>heading</h1></i>
- <p>new paragraph <b>bold text
- <p>some more bold text
- </pre>
-
- <p>Tidy maps this to</p>
-
- <pre>
- <h1><i>heading</i></h1>
- <p>new paragraph <b>bold text</b>
- <p><b>some more bold text</b>
- </pre>
- </li>
-
- <li><b>Getting the <hr> in the right place:</b>
-
- <pre>
- <h1><hr>heading</h1>
- <h2>sub<hr>heading</h2>
- </pre>
-
- <p>Tidy maps this to</p>
-
- <pre>
- <hr>
- <h1>heading</h1>
- <h2>sub</h2>
- <hr>
- <h2>heading</h2>
- </pre>
- </li>
-
- <li><b>Adding the missing "/" in end tags for anchors:</b>
-
- <pre>
- <a href="#refs">References<a>
- </pre>
-
- <p>Tidy maps this to</p>
-
- <pre>
- <a href="#refs">References</a>
- </pre>
- </li>
-
- <li><b>Perfecting lists by putting in tags missed out:</b>
-
- <pre>
- <body>
- <li>1st list item
- <li>2nd list item
- </pre>
-
- <p>is mapped to</p>
-
- <pre>
- <body>
- <ul>
- <li>1st list item</li>
- <li>2nd list item</li>
- </ul>
- </pre>
- </li>
-
- <li><b>Missing quotes around attribute values are added</b>
-
- <p>Tidy inserts quote marks around all attribute values for you.
- It can also detect when you have forgotten the closing quote
- mark, although this is something you will have to fix
- yourself.</p>
- </li>
-
- <li><b>Unknown/Proprietary attributes are reported</b>
-
- <p>Tidy has a comprehensive knowledge of the attributes defined
- in the HTML 4.0 recommendation from W3C. This often allows you to
- spot where you have mistyped an attribute or value.</p>
- </li>
-
- <li><b>Proprietary elements are recognized and reported as
- such.</b>
-
- <p>Tidy will even work out which version of HTML you are using
- and insert the appropriate DOCTYPE declaration, as per the W3C
- recommendations.</p>
- </li>
-
- <li><b>Tags lacking a terminating '>' are spotted</b>
-
- <p>This is something you then have to fix yourself as Tidy is
- unsure of where the > should be inserted.</p>
- </li>
- </ul>
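-
- <p>The "Recovers from mixed up tags" example shows bold text being
- closed at the end of one paragraph and reopened in the next. The
- source file list further down includes istack.c, a "stack of active
- inline elements". The C sketch below illustrates that idea under
- heavy simplifying assumptions: the input is already split into
- tokens, tags carry no attributes, only p counts as block-level,
- every other tag is treated as inline, and unmatched end tags are
- simply dropped. It is not Tidy's code, it does not reproduce the
- other examples, and <tt>repair_inline</tt> and its helpers are
- hypothetical names.</p>

```c
#include <assert.h>
#include <string.h>

/* Simplified sketch of carrying open inline elements across block
   boundaries. Assumptions: tokens are pre-split, tags have no
   attributes, "<p>" is the only block-level start tag, and every
   other start tag is inline. This is not Tidy's istack.c. */

#define MAX_INLINE 16

static const char *inline_stack[MAX_INLINE];  /* open inline start tags */
static int depth = 0;

/* Append the end tag for a start tag such as "<b>" to out. */
static void append_end_tag(char *out, const char *start_tag) {
    strcat(out, "</");
    strncat(out, start_tag + 1, strlen(start_tag) - 2);  /* name only */
    strcat(out, ">");
}

/* Does the end tag "</x>" close the start tag "<x>"? */
static int closes(const char *end_tag, const char *start_tag) {
    size_t n = strlen(start_tag) - 2;                     /* name length */
    return strlen(end_tag) == n + 3 &&
           strncmp(end_tag + 2, start_tag + 1, n) == 0;
}

/* Write a repaired version of toks[0..n-1] into out; the caller must
   make out large enough. */
void repair_inline(const char **toks, int n, char *out) {
    out[0] = '\0';
    depth = 0;
    for (int i = 0; i < n; i++) {
        const char *t = toks[i];
        if (strcmp(t, "<p>") == 0) {
            /* close open inline elements, start the block, reopen them */
            for (int j = depth - 1; j >= 0; j--)
                append_end_tag(out, inline_stack[j]);
            strcat(out, t);
            for (int j = 0; j < depth; j++)
                strcat(out, inline_stack[j]);
        } else if (strncmp(t, "</", 2) == 0) {
            int k = depth - 1;
            while (k >= 0 && !closes(t, inline_stack[k]))
                k--;
            if (k >= 0) {
                /* also close anything opened inside the matching element */
                for (int j = depth - 1; j >= k; j--)
                    append_end_tag(out, inline_stack[j]);
                depth = k;
            }
            /* an end tag with no matching open inline element is dropped */
        } else if (t[0] == '<') {
            assert(depth < MAX_INLINE);
            inline_stack[depth++] = t;
            strcat(out, t);
        } else {
            strcat(out, t);                                /* plain text */
        }
    }
    for (int j = depth - 1; j >= 0; j--)                   /* end of input */
        append_end_tag(out, inline_stack[j]);
}
```

- <p>Fed the tokens of that example, it yields the same markup as
- Tidy's output above, apart from line breaks. A stack suits the job
- because end tags must be emitted in the reverse of the order in
- which the elements were opened.</p>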
-
- <h3>Layout style</h3>
-
- <p>You can choose which style you want Tidy to use when it
- generates the cleaned up markup: for instance whether you like
- elements to indent their contents or not.</p>
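-
- <p>Line wrapping is one of these layout choices (see the -wrap
- option and the wrap setting below). As a rough illustration only,
- here is a greedy wrapping routine in C, the language Tidy is
- written in. It knows nothing about markup, attribute values or
- preformatted text, all of which a real pretty printer has to
- respect, and <tt>wrap_text</tt> is a hypothetical name, not part
- of Tidy.</p>

```c
#include <assert.h>
#include <string.h>

/* Greedy word wrap for a single line of plain text: words (runs of
   non-space characters) are copied to out separated by one space, and
   a newline replaces that space whenever the next word would push the
   line past `margin` columns. A word longer than the margin gets a
   line to itself. out needs at most strlen(in) + 1 bytes. */
void wrap_text(const char *in, int margin, char *out) {
    assert(margin > 0);
    size_t o = 0;
    int col = 0;                         /* width of the current line */
    const char *p = in;
    while (*p) {
        while (*p == ' ')
            p++;                         /* skip the gap between words */
        if (*p == '\0')
            break;
        const char *start = p;
        while (*p && *p != ' ')
            p++;
        int len = (int)(p - start);
        if (col > 0 && col + 1 + len > margin) {
            out[o++] = '\n';             /* word does not fit: break */
            col = 0;
        } else if (col > 0) {
            out[o++] = ' ';
            col++;
        }
        memcpy(out + o, start, (size_t)len);
        o += (size_t)len;
        col += len;
    }
    out[o] = '\0';
}
```

- <p>For example, wrapping "aaa bbb ccc" at a margin of 7 puts
- "aaa bbb" on the first line and "ccc" on the second.</p>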
-
- <h3>Internationalization issues</h3>
-
- <p>Tidy offers you a choice of character encodings: US ASCII, ISO
- Latin-1, UTF-8 and the ISO 2022 family of 7-bit encodings. The
- full set of HTML 4.0 entities is defined. Cleaned up output uses
- HTML entity names for characters when appropriate. Otherwise
- characters outside the normal range are output as numeric
- character entities.</p>
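-
- <p>The rule just described (a named entity where one exists,
- otherwise a numeric character reference) can be sketched in a few
- lines of C for ASCII output. This is an assumption-laden
- illustration, not Tidy's entities.c: the four-entry table stands in
- for the full HTML 4.0 set, input is taken as Unicode code points,
- and <tt>put_char_ascii</tt> is a hypothetical name.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A tiny illustrative subset of the HTML 4.0 entity names; the full
   HTML 4.0 set is much larger. */
struct entity { const char *name; unsigned long code; };
static const struct entity entities[] = {
    { "nbsp", 160 }, { "copy", 169 }, { "eacute", 233 }, { "euro", 8364 }
};

/* Append code point c to the string in out (capacity size) in a form
   safe for ASCII output: the character itself below 128, otherwise a
   named entity if the table has one, else a numeric reference. */
void put_char_ascii(char *out, size_t size, unsigned long c) {
    size_t used = strlen(out);
    assert(used < size);
    if (c < 128) {
        snprintf(out + used, size - used, "%c", (int)c);
        return;
    }
    for (size_t i = 0; i < sizeof entities / sizeof entities[0]; i++) {
        if (entities[i].code == c) {
            snprintf(out + used, size - used, "&%s;", entities[i].name);
            return;
        }
    }
    snprintf(out + used, size - used, "&#%lu;", c);
}
```

- <p>Appending A, é (U+00E9) and an en dash (U+2013) in turn gives
- the letter A, the named entity for é, and the numeric reference
- for decimal 8211, since the en dash is not in this small table.</p>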
-
- <h3>Accessibility</h3>
-
- <p>Tidy offers advice on accessibility problems for people using
- non-graphical browsers. The most common thing you will see is the
- suggestion that you add a summary attribute to table elements. The
- idea is to provide a summary of the table's role and structure
- suitable for use with aural browsers.</p>
-
- <h3>Cleaning up presentational markup</h3>
-
- <p>Many tools generate HTML with an excess of FONT, NOBR and
- CENTER tags. Tidy's <em>-clean</em> option will replace them by
- style properties and rules using CSS. This makes the markup
- easier to read and maintain as well as reducing the file size!
- Tidy is expected to get smarter at this in the future.</p>
-
- <h3>Support for XML</h3>
-
- <p>XML processors compliant with W3C's XML 1.0 recommendation are
- very picky about which files they will accept. Tidy can help you
- to fix errors that cause your XML files to be rejected. Tidy
- doesn't yet recognize all XML features though, e.g. it doesn't
- yet understand CDATA sections or DTD subsets.</p>
-
- <h3>Creating Slides</h3>
-
- <p>The <em>-slides</em> option allows you to burst a single HTML
- file into a number of linked slides. Each H2 element in the input
- file is treated as delimiting the start of the next slide. The
- slides are named slide1.html, slide2.html, slide3.html etc. This
- is a relatively new feature and ideas are welcomed as to how to
- improve it. In particular, I plan to add support to the
- configuration file for setting the style sheet for slides and for
- customizing the slides via a template.</p>
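-
- <p>The splitting rule can be illustrated with a short C sketch. It
- scans raw text rather than a parse tree (Tidy itself builds a
- complete parse tree, as described under implementation details), and
- it only reports where each slide would start and which file name it
- would get. <tt>list_slides</tt> and <tt>is_h2_start</tt> are
- hypothetical names, and ignoring text before the first H2 element
- is a choice made for this sketch, not documented Tidy behavior.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if p starts an h2 start tag, with h in either case and with
   or without attributes: "<h2" followed by '>' or white space. */
static int is_h2_start(const char *p) {
    return p[0] == '<' && tolower((unsigned char)p[1]) == 'h' &&
           p[2] == '2' && (p[3] == '>' || isspace((unsigned char)p[3]));
}

/* Print the file name and byte range [start, end) of each slide: every
   h2 start tag begins a new slide that runs to the next one or to the
   end of doc. Text before the first h2 is ignored in this sketch.
   Returns the number of slides. */
int list_slides(const char *doc) {
    int count = 0;
    size_t start = 0;
    for (size_t i = 0; doc[i] != '\0'; i++) {
        if (is_h2_start(doc + i)) {
            if (count > 0)
                printf("slide%d.html: bytes [%zu, %zu)\n", count, start, i);
            count++;
            start = i;
        }
    }
    if (count > 0)
        printf("slide%d.html: bytes [%zu, %zu)\n", count, start, strlen(doc));
    return count;
}
```

- <p>Given a document containing two H2 elements, it prints one line
- each for slide1.html and slide2.html and returns 2.</p>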
-
- <p>I would be interested in hearing from anyone who can offer
- help with using JavaScript for adding dynamic effects to slides,
- for instance similar to those available in Microsoft
- PowerPoint.</p>
-
- <h3>Indenting text for a better layout</h3>
-
- <p>Here is an example of the indented layout style:</p>
-
- <pre>
- <html>
- <head>
- </head>
- <body>
- <p>
- para which has enough text to cause a line break, and so test
- the wrapping mechanism for long lines.
- </p>
- <pre>This is
- <em>genuine
- preformatted</em>
- text
- </pre>
- <ul>
- <li>
- 1st list item
- </li>
- <li>
- 2nd list item
- </li>
- </ul>
- <!-- end comment -->
- </body>
- </html>
- </pre>
-
- <p>and this is the default style:</p>
-
- <pre>
- <html>
- <head>
- </head>
- <body>
- <p>para which has enough text to cause a line break, and so test
- the wrapping mechanism for long lines.</p>
-
- <pre>This is
- <em>genuine
- preformatted</em>
- text
- </pre>
-
- <ul>
- <li>1st list item </li>
-
- <li>2nd list item</li>
- </ul>
-
- <!-- end comment -->
- </body>
- </html>
-
- </pre>
-
- <h3><a name="help">How to run tidy</a></h3>
-
- <pre>
- <font color=
- "maroon">tidy</font> <em>[[options] filename]*</em>
- </pre>
-
- <p>HTML Tidy is not (yet) a GUI program: it runs from the command line. If you run tidy
- without any arguments, it will just sit there waiting to read
- markup on the stdin stream. Tidy's input and output default to
- stdin and stdout respectively. Errors are written to stderr but
- can be redirected to a file with the -f <em>filename</em>
- option.</p>
-
- <p>I generally use the -m option to get tidy to update the
- original file, and if the file is particularly bad I also use the
- -f option to write the errors to a file to make it easier to
- review them. Tidy supports a small set of character encoding
- options. The default is ASCII, which makes it easy to edit markup
- in regular text editors.</p>
-
- <p>For instance:</p>
-
- <pre>
- tidy -f errs.txt -m index.html
- </pre>
-
- <p>which runs tidy on the file "index.html", updating it in place
- and writing the error messages to the file "errs.txt". It's a good
- idea to save your work before tidying it, as with all complex
- software, tidy may have bugs. If you find any please let me
- know!</p>
-
- <p>Users running Microsoft Windows should be aware that DOS
- doesn't expand wildcards in filenames. This means that if you
- have several html files in the same directory and want to tidy
- all of them:</p>
-
- <pre>
- tidy *.html
- </pre>
-
- <p>won't work. You will see an error message: "can't open file
- *.html". Instead you need to run tidy separately on each one. I
- will look into a fix for this for a future release. A workaround
- is to use the DOS <em>for</em> command, as in:</p>
-
- <pre>
- for %i in (*.html) do tidy %i
- </pre>
-
- <p>Note: in a batch file, use %%i instead of %i.</p>
-
- <h4>Tidy's Options</h4>
-
- <p>To get a list of available options use:</p>
-
- <pre>
- tidy -help
- </pre>
-
- <p>You should see something like this:</p>
-
- <pre>
- options for tidy vers: 14th April 1999
-
- <font color=
- "maroon">-config <em>file</em></font> read config <em>file</em>
- <font color="maroon">-indent</font> <i>or</i> <font color=
- "maroon">-i</font> indent element content
- <font color="maroon">-omit</font> <i>or</i> <font color=
- "maroon">-o</font> omit optional endtags
- <font color=
- "maroon">-wrap 72</font> wrap text at column 72 (default is 68)
- <font color="maroon">-upper</font> <i>or</i> <font color=
- "maroon">-u</font> force tags to upper case
- <font color="maroon">-clean</font> <i>or</i> <font color=
- "maroon">-c</font> replace font, nobr & center tags by CSS
- <font color=
- "maroon">-raw</font> don't o/p entities for chars 128 to 255
- <font color=
- "maroon">-ascii</font> use ASCII for output, Latin-1 for input
- <font color=
- "maroon">-latin1</font> use Latin-1 for both input and output
- <font color=
- "maroon">-utf8</font> use UTF-8 for both input and output
- <font color=
- "maroon">-iso2022</font> use ISO2022 for both input and output
- <font color="maroon">-numeric</font> <i>or</i> <font color=
- "maroon">-n</font> output numeric rather than named entities
- <font color="maroon">-modify</font> <i>or</i> <font color=
- "maroon">-m</font> to modify original files
- <font color="maroon">-errors</font> <i>or</i> <font color=
- "maroon">-e</font> show only error messages
- <font color=
- "maroon">-f <em>file</em></font> write errors to <em>file</em>
- <font color=
- "maroon">-xml</font> use this when input is in XML
- <font color=
- "maroon">-asxml</font> to convert HTML to XML
- <font color=
- "maroon">-slides</font> to burst into slides on h2 elements
- <font color=
- "maroon">-help</font> list command line options
- </pre>
-
- <p>Input and Output default to stdin/stdout respectively. Single
- letter options apart from -f may be combined as in: tidy -f
- errs.txt -imu foo.html</p>
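-
- <p>The combining rule can be sketched in C. This is a hypothetical
- illustration, not Tidy's option parser: the <tt>struct options</tt>
- fields and <tt>apply_short_options</tt> are made-up names, and long
- forms such as -indent are assumed to be recognized before this
- function is tried.</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flags for the bundlable single-letter options listed
   above; this is not Tidy's own data structure. */
struct options {
    int indent, omit, upper, clean, numeric, modify, errors_only;
};

/* Apply a bundle such as "-imu". Every letter is validated before any
   flag is set, so a bad bundle leaves o unchanged. Returns 0 on success
   or -1 if the argument is not a bundle of known letters; 'f' is
   rejected because -f takes a file name and so cannot be combined. */
int apply_short_options(const char *arg, struct options *o) {
    assert(arg != NULL && o != NULL);
    if (arg[0] != '-' || arg[1] == '\0')
        return -1;
    for (const char *p = arg + 1; *p; p++)
        if (strchr("iouncme", *p) == NULL)
            return -1;
    for (const char *p = arg + 1; *p; p++) {
        switch (*p) {
        case 'i': o->indent = 1; break;
        case 'o': o->omit = 1; break;
        case 'u': o->upper = 1; break;
        case 'c': o->clean = 1; break;
        case 'n': o->numeric = 1; break;
        case 'm': o->modify = 1; break;
        case 'e': o->errors_only = 1; break;
        }
    }
    return 0;
}
```

- <p>So -imu sets indent, modify and upper case output, while a
- bundle containing f (such as -if) is rejected as a whole.</p>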
-
- <h3><a name="config">Using a Configuration File</a></h3>
-
- <p>Tidy now supports a configuration file, and this is by far
- the most convenient way to configure Tidy. Assuming you have
- created a config file named "config.txt" (the name doesn't
- matter), you can instruct Tidy to use it via the command line
- option <tt>-config config.txt</tt>, e.g.</p>
-
- <pre>
- tidy -config config.txt file1.html file2.html
- </pre>
-
- <p>Alternatively, you can name the default config file via the
- environment variable named "HTML_TIDY". Note that this should be an
- absolute path, since you are likely to want to run Tidy in
- different directories.</p>
-
- <p>The following options are supported:</p>
-
- <dl>
- <dt>markup: <em>bool</em></dt>
-
- <dd>Determines whether Tidy generates a pretty printed version of
- the markup. Bool values are either <em>yes</em> or <em>no</em>.
- Note that Tidy won't generate a pretty printed version if it
- finds unknown tags, or missing trailing quotes on attribute
- values, or missing trailing '>' on tags. The default is <em>
- no</em>.</dd>
-
- <dt>wrap: <em>number</em></dt>
-
- <dd>Sets the right margin for line wrapping: Tidy attempts to wrap
- lines so that they fit within this many columns. The default is
- column 66.</dd>
-
- <dt>tab-size: <em>number</em></dt>
-
- <dd>Sets the number of columns between successive tab stops. The
- default is 4. It is used to map tabs to spaces when reading
- files. Tidy never outputs files with tabs.</dd>
-
- <dt>indent: <em>no, yes</em> or <em>auto</em></dt>
-
- <dd>If set to <em>yes</em> Tidy will indent block-level tags. The
- default is <em>no</em>. If set to <em>auto</em> Tidy will decide
- whether or not to indent the content of tags such as h1-h6, li,
- or p depending on whether or not the content includes a
- block-level element.</dd>
-
- <dt>indent-spaces: <em>number</em></dt>
-
- <dd>Sets the number of spaces to indent content when indentation
- is enabled. The default is 2 spaces.</dd>
-
- <dt>hide-endtags: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em>, optional end-tags will be omitted
- when generating the pretty printed markup. This option is ignored
- if you are outputting to XML. The default is <em>no</em>.</dd>
-
- <dt>input-xml: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em>, Tidy will use the XML parser rather
- than the error correcting HTML parser. The default is <em>
- no</em>.</dd>
-
- <dt>output-xml: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em>, Tidy will generate the pretty
- printed output as well-formed XML. Any entities not
- defined in XML 1.0 will be written as numeric entities to allow
- them to be parsed by an XML parser. The default is <em>
- no</em>.</dd>
-
- <dt>output-xhtml: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em>, Tidy will generate the pretty
- printed output as extensible HTML (XHTML). The default is <em>
- no</em>. This option causes Tidy to set the doctype and default
- namespace as appropriate to XHTML. If a doctype or namespace is
- given, they will be checked for consistency with the content of the
- document. In the case of an inconsistency, the corrected values
- will appear in the output. For XHTML, entities can be written as
- named or numeric entities according to the value of the
- "numeric-entities" property.</dd>
-
- <dt>char-encoding: <em>raw, ascii, latin1, utf8</em> or <em>
- iso2022</em></dt>
-
- <dd>Determines how Tidy interprets character streams. For <em>
- ascii</em>, Tidy will accept Latin-1 character values, but will
- use entities for all characters whose value > 127. For <em>
- raw</em>, Tidy will output values above 127 without translating
- them into entities. For <em>latin1</em>, characters above 255 will
- be written as entities. For <em>utf8</em>, Tidy assumes that both
- input and output are encoded as UTF-8. You can use <em>
- iso2022</em> for files encoded using the ISO 2022 family of
- encodings, e.g. ISO-2022-JP. The default is <em>ascii</em>.</dd>
-
- <dt>numeric-entities: <em>bool</em></dt>
-
- <dd>Causes entities other than the basic XML 1.0 named entities
- to be written in the numeric rather than the named entity form.
- The default is <em>no</em>.</dd>
-
- <dt>quote-marks: <em>bool</em></dt>
-
- <dd>If set, this causes " characters to be written out as
- &quot; as is preferred by some editing environments. The
- default is <em>no</em>.</dd>
-
- <dt>quote-nbsp: <em>bool</em></dt>
-
- <dd>If set, this causes non-breaking space characters to be
- written out as entities. The default is <em>yes</em>.</dd>
-
- <dt>quote-ampersand: <em>bool</em></dt>
-
- <dd>If set, this causes unadorned & characters to be
- written out as &amp;. The default is <em>yes</em>.</dd>
-
- <dt>wrap-script-literals: <em>bool</em></dt>
-
- <dd>If set, this allows lines to be wrapped within string
- literals that appear in script attributes. The default is <em>
- no</em>. The example below shows how Tidy wraps a very long
- script string literal, inserting a backslash character before the
- line break:
-
- <pre>
- <a href="somewhere.html" onmouseover="document.status = '...some \
- really, really, really, really, really, really, really, really, \
- really, really long string..';">test</a>
- </pre>
- </dd>
-
- <dt>break-before-br: <em>bool</em></dt>
-
- <dd>If set, Tidy will output a line break before each <br>
- element. The default is <em>no</em>.</dd>
-
- <dt>uppercase-tags: <em>bool</em></dt>
-
- <dd>Causes tag names to be output in upper case. The default is
- <em>no</em>.</dd>
-
- <dt>uppercase-attributes: <em>bool</em></dt>
-
- <dd>Causes attribute names to be output in upper case. The
- default is <em>no</em>.</dd>
-
- <dt>clean: <em>bool</em></dt>
-
- <dd>If set, causes Tidy to strip out surplus presentational tags
- and attributes replacing them by style rules and structural
- markup as appropriate. It works well on HTML saved from
- Microsoft Office 97. I hope to work on cleaning up after Office
- 2000 in a future release. The default is <em>no</em>.</dd>
-
- <dt>write-back: <em>bool</em></dt>
-
- <dd>If set, Tidy will write back the tidied markup to the same
- file it read from. The default is <em>no</em>. You are advised to
- keep copies of important files before tidying them, as on rare
- occasions the result may not be what you expect.</dd>
-
- <dt>error-file: <em>filename</em></dt>
-
- <dd>Writes errors and warnings to the specified file rather than
- to stderr.</dd>
-
- <dt>show-warnings: <em>bool</em></dt>
-
- <dd>If set to no, warnings are suppressed. This can be useful
- when a few errors are hidden in a flurry of warnings. The default
- is <em>yes</em>.</dd>
-
- <dt>split: <em>bool</em></dt>
-
- <dd>If set to <em>yes</em> Tidy will use the input file to create
- a sequence of slides, splitting the markup prior to each
- successive <h2>. You can see an example of the results in a
- <a href="http://www.w3.org/Talks/1999/03/23-stockholm-xhtml">
- recent talk I made on XHTML</a>. The slides are written to
- "slide1.html", "slide2.html" etc. The default is <em>
- no</em>.</dd>
-
- <dt>new-inline-tags: <em>tag1, tag2, tag3</em></dt>
-
- <dd>Use this to declare new inline tags. The option takes a space
- or comma separated list of tag names. Unless you declare new
- tags, Tidy will refuse to generate a tidied file if the input
- includes previously unknown tags.</dd>
-
- <dt>new-blocklevel-tags: <em>tag1, tag2, tag3</em></dt>
-
- <dd>Use this to declare new block-level tags. The option takes a
- space or comma separated list of tag names. Unless you declare
- new tags, Tidy will refuse to generate a tidied file if the input
- includes previously unknown tags.</dd>
- </dl>
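-
- <p>Taken together, the quote-marks, quote-nbsp and quote-ampersand
- settings decide, character by character, what gets written as an
- entity. A minimal C sketch under stated assumptions: input is
- Latin-1 text, the <tt>struct quote_opts</tt> type and
- <tt>escape_text</tt> are hypothetical names, and always escaping the
- less-than and greater-than signs is a choice made here.</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flags mirroring the quote-marks, quote-nbsp and
   quote-ampersand settings (documented defaults: no, yes, yes). */
struct quote_opts { int marks, nbsp, ampersand; };

/* Escape Latin-1 text in into out. out needs up to 6 bytes per input
   byte (the length of "&quot;" and "&nbsp;") plus a terminator.
   '<' and '>' are always escaped here; every '&' counts as unadorned,
   since this sketch does not recognize existing entity references. */
void escape_text(const char *in, const struct quote_opts *q, char *out) {
    assert(in != NULL && q != NULL && out != NULL);
    size_t o = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        const char *rep = NULL;
        if (*p == '<')
            rep = "&lt;";
        else if (*p == '>')
            rep = "&gt;";
        else if (*p == '&' && q->ampersand)
            rep = "&amp;";
        else if (*p == '"' && q->marks)
            rep = "&quot;";
        else if (*p == 0xA0 && q->nbsp)     /* Latin-1 non-breaking space */
            rep = "&nbsp;";
        if (rep != NULL) {
            size_t len = strlen(rep);
            memcpy(out + o, rep, len);
            o += len;
        } else {
            out[o++] = (char)*p;
        }
    }
    out[o] = '\0';
}
```

- <p>With the documented defaults, ampersands and non-breaking spaces
- become entities while double quotes are left alone; turning on
- quote-marks alone does the opposite for those characters.</p>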
-
- <h4>Sample Config File</h4>
-
- <pre>
- // sample config file for HTML tidy
- indent: auto
- indent-spaces: 2
- wrap: 72
- markup: yes
- clean: yes
- output-xml: no
- input-xml: no
- show-warnings: yes
- numeric-entities: yes
- quote-marks: yes
- quote-nbsp: yes
- quote-ampersand: no
- break-before-br: no
- uppercase-tags: no
- uppercase-attributes: no
- output-xhtml: yes
- char-encoding: latin1
- </pre>
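-
- <p>The sample above suggests a simple line format: a name, a colon
- and a value, with lines starting with // treated as comments. A
- hedged C sketch of reading one such line follows. It is not the code
- in config.c, <tt>parse_config_line</tt> is a hypothetical name, and
- treating // only at the start of a line as a comment is an
- assumption drawn from the sample.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Parse one "name: value" line. Blank lines and lines starting with
   "//" return 0 (nothing to do). On success the trimmed name and value
   are copied into the caller's buffers and 1 is returned; a line with
   no colon, or a name/value too long for its buffer, returns -1. */
int parse_config_line(const char *line, char *name, size_t name_size,
                      char *value, size_t value_size) {
    assert(line != NULL && name != NULL && value != NULL);
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || strncmp(line, "//", 2) == 0)
        return 0;
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;
    /* name: everything before the colon, minus trailing white space */
    size_t n = (size_t)(colon - line);
    while (n > 0 && isspace((unsigned char)line[n - 1]))
        n--;
    /* value: everything after the colon, trimmed at both ends */
    const char *v = colon + 1;
    while (isspace((unsigned char)*v))
        v++;
    size_t m = strlen(v);
    while (m > 0 && isspace((unsigned char)v[m - 1]))
        m--;
    if (n >= name_size || m >= value_size)
        return -1;
    memcpy(name, line, n);
    name[n] = '\0';
    memcpy(value, v, m);
    value[m] = '\0';
    return 1;
}
```

- <p>Values come back as strings; interpreting yes/no, numbers,
- encoding names or tag lists would be up to each option.</p>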
-
- <h3><a name="download">Downloadable Binaries</a></h3>
-
- <p class="note">If you are prepared to maintain a public URL for
- HTML Tidy compiled for a specific platform, please let me know so
- that I can add a link to your page. This will avoid the need for
- me to update this page whenever you recompile.</p>
-
- <p><b><a href="http://www.chami.com/free/html-kit/">Windows
- users</a></b>! A free graphical user interface (HTML-Kit) for
- HTML Tidy is now available for Windows 95/98/NT. Alternatively,
- you can get tidy in its native form as a Windows console program:
- <a href="http://www.w3.org/People/Raggett/tidy.exe"><b>
- tidy.exe</b></a>, with the command options as per above.</p>
-
- <p><b><a href=
- "http://www.geocities.com/SiliconValley/1057/tidy.html">Mac
- users</a></b>! You can now run <a href=
- "http://www.geocities.com/SiliconValley/1057/tidy.html">HTML Tidy
- with FilterTop</a> (<a href=
- "http://www.geocities.com/SiliconValley/1057/images/TidyHTML.GIF">
- Screenshot</a>), or as a command line interface application. My
- thanks to <a href="mailto:teague@macbroker.com">Terry Teague</a>
- for this port.</p>
-
- <p><b><a href=
- "http://www.amiga.u-net.com/MadDogSoftware/Tidy.html">Amiga
- users</a></b>! Keith Blakemore-Noble has compiled Tidy for the
- Amiga.</p>
-
- <p><b><a href=
- "http://www-frec.bull.com/cgi-bin/list_dir.cgi/download/">AIX
- executable for Tidy</a></b>! Compiled by Ciaran Deignan. The link
- is to a general download page. The executable is available for
- AIX 4.3.2 and later.</p>
-
- <h3><a name="quotes">Integrating Tidy as part of other
- Software</a></h3>
-
- <p>You can also incorporate Tidy as part of a larger program, for
- instance in HTML editors or HTML transformation tools used for
- import filters, or for when you want to customize Web content
- to get the best out of different kinds of browsers. Imagine
- authoring clean HTML with CSS and at a touch of a button
- producing variants that look great and work reliably on a large
- variety of different browsers, taking into account the quirks of
- each. For instance, providing the ability to tune content for
- different versions of Netscape and Internet Explorer, and for
- browsers running on set-top boxes for televisions, handheld and
- palmtop devices, cellphones, and voice browsers. I am happy to
- quote for software development for such tools.</p>
-
- <h3><a name="implementation">Implementation details</a></h3>
-
- <p>The code is in ANSI C and uses the C standard library for i/o.
- The parser is thread-safe although the code for pretty printing
- the parse tree is not (yet). The parser works top down, building
- a complete parse tree in memory. Document text is held as Unicode
- represented as UTF-8 in a character buffer that expands as
- needed. The code has so far been tested on Windows 95, Windows 98,
- Windows NT, Linux, FreeBSD, NetBSD, Ultrix, OSF, OS/MP, IRIX,
- NeXTSTEP, MacOS, BeOS, OS/2, AIX, Amiga, SunOS, Solaris and
- HP-UX, amongst others.</p>
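-
- <p>The paragraph above says document text is held as Unicode,
- represented as UTF-8, in a character buffer that expands as needed.
- A minimal C sketch of such a buffer follows. The struct and function
- names are hypothetical, and the doubling growth policy is a choice
- made for this sketch rather than a description of lexer.c.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A character buffer that grows as needed and stores text as UTF-8. */
struct textbuf { char *data; size_t len, cap; };

/* Make room for `extra` more bytes plus a terminator; 0 on success. */
static int tb_reserve(struct textbuf *b, size_t extra) {
    if (b->len + extra + 1 <= b->cap)
        return 0;
    size_t cap = b->cap ? b->cap : 64;
    while (cap < b->len + extra + 1)
        cap *= 2;                        /* double to keep appends cheap */
    char *p = realloc(b->data, cap);
    if (p == NULL)
        return -1;
    b->data = p;
    b->cap = cap;
    return 0;
}

/* Append code point c as 1 to 4 UTF-8 bytes. Returns -1 for values
   above U+10FFFF or if the buffer cannot grow. */
int tb_add_codepoint(struct textbuf *b, unsigned long c) {
    assert(b != NULL);
    unsigned char u[4];
    size_t n;
    if (c < 0x80) {
        u[0] = (unsigned char)c; n = 1;
    } else if (c < 0x800) {
        u[0] = (unsigned char)(0xC0 | (c >> 6));
        u[1] = (unsigned char)(0x80 | (c & 0x3F)); n = 2;
    } else if (c < 0x10000) {
        u[0] = (unsigned char)(0xE0 | (c >> 12));
        u[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        u[2] = (unsigned char)(0x80 | (c & 0x3F)); n = 3;
    } else if (c < 0x110000) {
        u[0] = (unsigned char)(0xF0 | (c >> 18));
        u[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        u[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        u[3] = (unsigned char)(0x80 | (c & 0x3F)); n = 4;
    } else {
        return -1;
    }
    if (tb_reserve(b, n) != 0)
        return -1;
    memcpy(b->data + b->len, u, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

- <p>Appending U+0041, U+00E9 and U+20AC leaves 6 bytes in the buffer
- (1, 2 and 3 bytes respectively). Doubling the capacity keeps the
- total copying cost of many small appends proportional to the final
- length.</p>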
-
- <dl>
- <dt><a href="../tidy15apr99.tgz">tidy15apr99.tgz</a></dt>
-
- <dd>gzipped tar file for source code (Unix line ends)</dd>
-
- <dt><a href="../tidy15apr99.zip">tidy15apr99.zip</a></dt>
-
- <dd>zipped source code (Windows line ends)</dd>
-
- <dt><a href="http://www.w3.org/People/Raggett/tidy.exe">
- tidy.exe</a></dt>
-
- <dd>Windows 95/NT executable (32-bit Windows console-mode
- program)</dd>
-
- <dt><a href=
- "http://www.w3.org/People/Raggett/tidy17dec98.ppc.tgz">
- tidy17dec98.ppc.tgz</a></dt>
-
- <dd>Gzipped archive of the binary for BeOS PPC R4. It also
- contains the complete tidy distribution and a Makefile.BeOS file for
- BeOS (from the 17dec98 release of tidy).</dd>
-
- <dt><a href=
- "http://www.dd.iij4u.or.jp/~kshimz/warp/tidy/tidy.zip">Tidy on
- OS/2</a></dt>
-
- <dd>Zipped archive of the OS/2 release of tidy, as compiled by
- Kaz SHiMZ <<a href=
- "mailto:kshimz@sfc.co.jp">kshimz@sfc.co.jp</a>></dd>
-
- <dt><a href="platform.h">platform.h</a>, <a href="html.h">
- html.h</a></dt>
-
- <dd>the include files with common definitions</dd>
-
- <dt><a href="config.c">config.c</a></dt>
-
- <dd>support for customizing Tidy via config files</dd>
-
- <dt><a href="lexer.c">lexer.c</a></dt>
-
- <dd>lexical analysis and buffer management</dd>
-
- <dt><a href="parser.c">parser.c</a></dt>
-
- <dd>HTML and XML parsers</dd>
-
- <dt><a href="tags.c">tags.c</a></dt>
-
- <dd>dictionary of tags and their properties</dd>
-
- <dt><a href="attrs.c">attrs.c</a></dt>
-
- <dd>dictionary of attributes and their properties</dd>
-
- <dt><a href="istack.c">istack.c</a></dt>
-
- <dd>stack of active inline elements</dd>
-
- <dt><a href="entities.c">entities.c</a></dt>
-
- <dd>dictionary of entities</dd>
-
- <dt><a href="clean.c">clean.c</a></dt>
-
- <dd>smarts for cleaning up presentational markup</dd>
-
- <dt><a href="pprint.c">pprint.c</a></dt>
-
- <dd>pretty printing for HTML and XML</dd>
-
- <dt><a href="localize.c">localize.c</a></dt>
-
- <dd>Change this file to localize tidy's messages</dd>
-
- <dt><a href="tidy.c">tidy.c</a></dt>
-
- <dd>main() and error reporting routines</dd>
-
- <dt><a href="Makefile">Makefile</a></dt>
-
- <dd>Makefile for gcc</dd>
- </dl>
-
- <p>Conventions for whether lines end with CRLF, LF or CR vary
- from one system to another. I have included the C source for a
- utility <b>tab2space</b> which can be used to ensure that files
- use the line end convention of your choice, and to expand tabs to
- spaces.</p>
-
- <pre>
- tab2space -t4 -unix *.h *.c
- tab2space -tabs -unix Makefile
- </pre>
-
- <p>Note the use of "-tabs" to ensure that tabs are preserved in the
- Makefile (it won't work without them!).</p>
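-
- <p>The tab expansion that tab2space performs with -t4 (and that Tidy
- applies when reading files, according to the tab-size setting) comes
- down to padding each tab to the next tab stop. Here is a minimal C
- sketch; it is not the tab2space source, it leaves line-end
- conversion aside, and <tt>expand_tabs</tt> is a hypothetical
- name.</p>

```c
#include <assert.h>
#include <string.h>

/* Expand tabs to spaces using tab stops every `tabsize` columns.
   A newline resets the column. out must be large enough: up to
   tabsize bytes per input tab, one per other byte, plus a terminator. */
void expand_tabs(const char *in, int tabsize, char *out) {
    assert(tabsize > 0);
    size_t o = 0;
    int col = 0;                     /* current column, counting from 0 */
    for (; *in; in++) {
        if (*in == '\t') {
            do {                     /* pad to the next multiple of tabsize */
                out[o++] = ' ';
                col++;
            } while (col % tabsize != 0);
        } else {
            out[o++] = *in;
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    out[o] = '\0';
}
```

- <p>With a tab size of 4, "a", a tab and "b" become "a" followed by
- three spaces and "b", because the tab moves from column 1 to the
- stop at column 4.</p>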
-
- <p>For those of you on Unix, here is a script you can use to
- strip carriage returns:</p>
-
- <pre>
- #!/bin/sh
- echo Stripping Carriage Returns from files...
- for i
- do
-     # Only process regular files that are writable
-     if [ -f "$i" ]
-     then
-         if [ -w "$i" ]
-         then
-             echo "$i"
-             # strip CRs (octal 015) via a temp file, then replace the original
-             tr -d '\015' < "$i" > toix.tmp
-             mv toix.tmp "$i"
-         else
-             echo "$i: write-protected"
-         fi
-     else
-         echo "$i: not a file"
-     fi
- done
- </pre>
-
- <p>Save this script to a file, e.g. "<em>stripcr</em>", and use
- "<em>chmod +x stripcr</em>" to make it executable. You can then
- run it as "<em>stripcr *.c *.h Overview.html Makefile</em>".</p>
-
- <h3><a name="acks">Acknowledgements</a></h3>
-
- <p>I would like to thank the many people who have written to me
- with suggestions for improvements or reporting bugs. Your help
- has been invaluable.</p>
-
- <blockquote class="people">Drew Adams, Jacob Sparre Andersen,
- Osma Ahvenlampi, Joe D'Andrea, Jerry Andrews, Chang Hyun Baek,
- Chuck Baslock, Christer Bernerus, Keith Blakemore-Noble, Eric
- Blossom, David Brooke, Andy Brown, Keith B. Brown, Andreas
- Buchholz, Maurice Buxton, Jelks Cabaniss, Trevor Carden, Terry
- Cassidy, Mathew Cepl, Kendall Clark, Jeremy Clulow, Dan Connolly,
- Keith Davies, Claus André Färber, Stephanie Foott, Rene Fritz,
- David Getchell, Michael Giroux, Guus Goos,
- Léa Gris, Francisco Guardiola, Juha Häikiö, G. Ken Holman, Craig
- Horman, Jack Horsfield, Rick Jelliffe, Craig Johnson, Charles
- LaFountain, Steven Lobo, Zdenek Kabelac, Michael Kay, Johannes
- Koch, Rudy Kohut, Allan Kuchinsky, Nick Leverton, Dietmar Lippold,
- Gert-Jan C. Lokhorst, Anton Marsden, Shane McCarron, Ian McKellar,
- Chris Nappin, Ann Navarro, Allan Odgaard, Matt Oshry, Gerald
- Oskoboiny, Ernst Paalvast, Christian Pantel, Steven Pemberton,
- Xavier Plantefeve, Ross L. Richardson, Philip Riebold, Erik Rossen,
- Dan Rudman, Christian Ruetgers, Klaus Johannes Rusch, Eric Schindler,
- J. Schlauch, Christian Schüler, Jim Seymour, Kazuyoshi Shimizu,
- Geoff Sinclair, Jo Smith, Rafi Stern, Michael J. Suzio, Oren
- Tirosh, John Tobler, Stuart Updegrave, Charles A. Upsdell, Larry
- W. Virden, Daniel Vogelheim, Jez Wain, Paul Ward, Jeff
- Young</blockquote>
-
- <p><small><a href="http://www.w3.org/People/Raggett">Dave
- Raggett</a> <<a href="mailto:dsr@w3.org">dsr@w3.org</a>> is
- an engineer from <a href="http://www.hp.com/">Hewlett
- Packard</a>'s <a href="http://www.hpl.hp.co.uk">UK
- Laboratories</a>, and works on assignment to the World Wide Web
- Consortium, where he is the W3C lead for HTML, Math and Voice
- Browsers.</small></p>
- </body>
- </html>
-
-